{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "kxu1Gfhx1pHg"
      },
      "source": [
        "<a href=\"https://colab.research.google.com/github/jeffheaton/app_generative_ai/blob/main/t81_559_class_10_5_chat_multimodal.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "ZbNbAV281pHh"
      },
      "source": [
        "# T81-559: Applications of Generative Artificial Intelligence\n",
        "**Module 10: StreamLit**\n",
        "* Instructor: [Jeff Heaton](https://sites.wustl.edu/jeffheaton/), McKelvey School of Engineering, [Washington University in St. Louis](https://engineering.wustl.edu/Programs/Pages/default.aspx)\n",
        "* For more information visit the [class website](https://sites.wustl.edu/jeffheaton/t81-558/)."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "HwnvSYEQ1pHi"
      },
      "source": [
        "# Module 10 Material\n",
        "\n",
        "Module 10: StreamLit\n",
        "\n",
        "* Part 10.1: Running StreamLit in Google Colab [[Video]]() [[Notebook]](t81_559_class_10_1_streamlit.ipynb)\n",
        "* Part 10.2: StreamLit Introduction [[Video]]() [[Notebook]](t81_559_class_10_2_streamlit_intro.ipynb)\n",
        "* Part 10.3: Understanding Streamlit State [[Video]]() [[Notebook]](t81_559_class_10_3_streamlit_state.ipynb)\n",
        "* Part 10.4: Creating a Chat Application [[Video]]() [[Notebook]](t81_559_class_10_4_chat.ipynb)\n",
        "* **Part 10.5: MultiModal Chat Application** [[Video]]() [[Notebook]](t81_559_class_10_5_chat_multimodal.ipynb)\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "XLFov09h18yC"
      },
      "source": [
        "# Google CoLab Instructions\n",
        "\n",
        "The following code ensures that Google CoLab is running and maps Google Drive if needed."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "AWGARRT92DrA"
      },
      "outputs": [],
      "source": [
        "import os\n",
        "\n",
        "try:\n",
        "    from google.colab import drive, userdata\n",
        "    COLAB = True\n",
        "    print(\"Note: using Google CoLab\")\n",
        "except:\n",
        "    print(\"Note: not using Google CoLab\")\n",
        "    COLAB = False\n",
        "\n",
        "# OpenAI Secrets\n",
        "if COLAB:\n",
        "    os.environ[\"OPENAI_API_KEY\"] = userdata.get('OPENAI_API_KEY')\n",
        "\n",
        "# Install needed libraries in CoLab\n",
        "if COLAB:\n",
        "    !pip install langchain langchain_openai openai streamlit"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "v2MPPX0c1pHi"
      },
      "source": [
        "# Part 10.5: MultiModal Chat Application\n",
        "\n",
        "In this module, we will guide you through the process of creating a multimodal StreamLit-based LLM chat application. To keep things accessible and straightforward, we'll run our app using Google Colab. Additionally, we'll introduce you to llm_util.py, a utility script designed to make it easier to work with LangChain-compatible large language models (LLMs). For this example, we'll be using OpenAI's LLM to power our chat application. By the end of this module, you'll have a functional, interactive chat app and a solid understanding of how to integrate LLMs into your projects using StreamLit.\n",
        "\n",
        "We will now create three files:\n",
        "\n",
        "* **app.py** - The main StreamLit chat application.\n",
        "* **llm_util.py** - The LLM utility that allows us to utilize any LangChain LLM for our chat application.\n",
        "* **llms.yaml** - A config file to define which LLM's we will use; for this example it will be OpenAI.\n",
        "\n",
        "This chat application allows images to be attached to the conversation.\n",
        "\n",
        "## Chat Application\n",
        "\n",
        "To enhance your chatbot with multimodal capabilities, you need to modify your Streamlit application to accept both text and image inputs from users. Start by replacing the simple text input with a form that includes a text field and an image uploader. This allows users to type messages and optionally attach images.\n",
        "\n",
        "When a user submits the form, process the inputs by creating a message content structure that includes both the text and the image data. For the image, read the uploaded file, encode it in base64, and format it into a data URL. This prepares the image data to be sent to the language model in a compatible format.\n",
        "\n",
        "Instead of using a basic conversation chain, utilize a language model interface like ChatOpenAI that can handle messages containing both text and images. Invoke the language model with the properly formatted message, which includes the user's text and the encoded image data.\n",
        "\n",
        "Manage the conversation history using session state to maintain context across interactions. Update the chat interface to display both the user's inputs (text and images) and the assistant's responses. Ensure that the assistant's replies are rendered appropriately in the interface.\n",
        "\n",
        "By implementing these changes, your chatbot will support multimodal interactions, allowing users to engage in richer conversations that combine textual and visual information. This setup leverages Streamlit for the user interface and LangChain along with OpenAI's language models for processing, enabling the chatbot to interpret and respond to images attached by users."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "nEQ9c5Akqj_R"
      },
      "outputs": [],
      "source": [
        "%%writefile app.py\n",
        "import streamlit as st\n",
        "from openai import OpenAI\n",
        "from llm_util import *\n",
        "import base64\n",
        "from langchain_core.messages import HumanMessage\n",
        "from langchain_openai import ChatOpenAI\n",
        "import sys\n",
        "\n",
        "# This retrieves all command line arguments as a list\n",
        "arguments = sys.argv\n",
        "if len(sys.argv) != 2:\n",
        "    print(\"Please specify the llm to use as the first argument\")\n",
        "    st.stop()\n",
        "else:\n",
        "    profile = sys.argv[1]\n",
        "\n",
        "st.title(\"Chat with Image Support\")\n",
        "\n",
        "if \"chat\" not in st.session_state:\n",
        "    client = open_llm(profile)\n",
        "    st.session_state.chat = client  # Assume this returns a ChatOpenAI instance\n",
        "\n",
        "if \"messages\" not in st.session_state:\n",
        "    st.session_state.messages = []\n",
        "\n",
        "# Display existing messages\n",
        "for message in st.session_state.messages:\n",
        "    with st.chat_message(message[\"role\"]):\n",
        "        st.markdown(message[\"content\"])\n",
        "\n",
        "# Create a form for user input\n",
        "with st.form(\"chat_form\"):\n",
        "    prompt = st.text_input(\"Enter your message:\")\n",
        "    uploaded_file = st.file_uploader(\"Upload an image (optional)\", type=[\"jpg\", \"jpeg\", \"png\"])\n",
        "    submit_button = st.form_submit_button(\"Send\")\n",
        "\n",
        "if submit_button:\n",
        "    # Build the user message content\n",
        "    message_content = []\n",
        "    if prompt:\n",
        "        message_content.append({\"type\": \"text\", \"text\": prompt})\n",
        "    if uploaded_file is not None:\n",
        "        # Read the image data and encode it in base64\n",
        "        image_bytes = uploaded_file.read()\n",
        "        image_type = uploaded_file.type  # e.g., 'image/jpeg'\n",
        "        image_data = base64.b64encode(image_bytes).decode(\"utf-8\")\n",
        "        # Include the image data in the message content\n",
        "        message_content.append(\n",
        "            {\"type\": \"image_url\", \"image_url\": {\"url\": f\"data:{image_type};base64,{image_data}\"}}\n",
        "        )\n",
        "    # Create the HumanMessage\n",
        "    message = HumanMessage(content=message_content)\n",
        "    # Append user's message to session state\n",
        "    st.session_state.messages.append({\"role\": \"user\", \"content\": prompt})\n",
        "    with st.chat_message(\"user\"):\n",
        "        if prompt:\n",
        "            st.markdown(prompt)\n",
        "        if uploaded_file is not None:\n",
        "            st.image(uploaded_file)\n",
        "    # Get response from the LLM\n",
        "    response = st.session_state.chat.invoke([message])\n",
        "    # Append assistant's response to messages\n",
        "    st.session_state.messages.append({\"role\": \"assistant\", \"content\": response.content})\n",
        "    with st.chat_message(\"assistant\"):\n",
        "        st.markdown(response.content)\n",
        "\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "HSKyHUHqhCcx"
      },
      "source": [
        "## LLM Utility\n",
        "\n",
        "\n",
        "This following code enables the chat application to use various language models supported by LangChain based on a configuration file. The llm_util.py script serves as a utility that dynamically loads and initializes different language models using configurations specified in a YAML file (llms.yaml). This approach provides a flexible way to change the language model without modifying the main application code, allowing for easy experimentation and customization."
      ]
    },
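    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "The dynamic loading idea can be tried in isolation, without any LLM library installed. The sketch below resolves a class from a dotted string path and passes the remaining config keys as constructor arguments. The `fractions.Fraction` class and the sample config entry are stand-ins chosen only for illustration; the actual utility reads its entries from `llms.yaml`.\n",
        "\n",
        "```python\n",
        "import importlib\n",
        "\n",
        "\n",
        "def get_class(class_path):\n",
        "    # Split \"package.module.ClassName\" into module path and class name\n",
        "    module_path, class_name = class_path.rsplit(\".\", 1)\n",
        "    module = importlib.import_module(module_path)\n",
        "    return getattr(module, class_name)\n",
        "\n",
        "\n",
        "# Hypothetical config entry shaped like one item under \"servers\" in llms.yaml\n",
        "server = {\"name\": \"demo\", \"class\": \"fractions.Fraction\", \"numerator\": 3, \"denominator\": 4}\n",
        "\n",
        "clazz = get_class(server[\"class\"])\n",
        "# Everything except \"name\" and \"class\" becomes a keyword argument\n",
        "params = {k: v for k, v in server.items() if k not in (\"name\", \"class\")}\n",
        "instance = clazz(**params)\n",
        "print(instance)  # 3/4\n",
        "```\n",
        "\n",
        "Because every remaining key is forwarded to the constructor, a misspelled key in the config surfaces as a `TypeError` when the class is instantiated."
      ]
    },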
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "bXml9nrGqj_R"
      },
      "outputs": [],
      "source": [
        "%%writefile llm_util.py\n",
        "import yaml\n",
        "\n",
        "# Load the YAML file\n",
        "def load_yaml(file_path):\n",
        "    with open(file_path, \"r\") as file:\n",
        "        return yaml.safe_load(file)\n",
        "\n",
        "\n",
        "# Function to dynamically import a class based on a string path (e.g., \"langchain_community.chat_models.ChatOllama\")\n",
        "def get_class(class_path):\n",
        "    module_path, class_name = class_path.rsplit(\".\", 1)\n",
        "    module = __import__(module_path, fromlist=[class_name])\n",
        "    return getattr(module, class_name)\n",
        "\n",
        "\n",
        "# Open Language Model Server function\n",
        "def open_llm(server_name):\n",
        "    config = load_yaml(\"llms.yaml\")\n",
        "    for server in config[\"servers\"]:\n",
        "        if server[\"name\"] == server_name:\n",
        "            class_path = server[\"class\"]\n",
        "            clazz = get_class(class_path)\n",
        "            # Remove 'class' and 'name' from the parameters as they're not needed for initialization\n",
        "            params = {k: v for k, v in server.items() if k not in [\"class\", \"name\"]}\n",
        "\n",
        "            return clazz(**params)\n",
        "    raise ValueError(f\"Server '{server_name}' not found\")"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "wk6T0aJEqj_S"
      },
      "outputs": [],
      "source": [
        "%%writefile llms.yaml\n",
        "\n",
        "servers:\n",
        "  - name: server1\n",
        "    class: langchain_openai.ChatOpenAI\n",
        "    model: gpt-4o-mini\n",
        "    temperature: 0\n",
        "\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "eMqVgvmxhySA"
      },
      "source": [
        "Next, we obtain the password for our StreamLit server we are about to launch."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "0iJmAPR8c-JE"
      },
      "outputs": [],
      "source": [
        "!curl https://loca.lt/mytunnelpassword"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "H3lKrMcvh3_F"
      },
      "source": [
        "We launch the StreamLit server and obtain its URL. You will need the above password when you access the URL it gives you."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "YSLPrByztQOe"
      },
      "outputs": [],
      "source": [
        "!streamlit run app.py server1 &>/content/logs.txt &\n",
        "!npx --yes localtunnel --port 8501"
      ]
    }
  ],
  "metadata": {
    "anaconda-cloud": {},
    "colab": {
      "provenance": []
    },
    "kernelspec": {
      "display_name": "Python 3 (ipykernel)",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.12.4"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}